OPEN ACCESS Freely available online

PLOS ONE

Network-Based Study Reveals Potential Infection
Pathways of Hepatitis-C Leading to Various Diseases

Anirban Mukhopadhyay 1 *, Ujjwal Maulik 2 

1 Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India, 2 Department of Computer Science and Engineering, Jadavpur 
University, Kolkata, West Bengal, India 

Abstract 

Protein-protein interaction network-based study of viral pathogenesis has been gaining popularity among computational
biologists in recent years. In the present study we attempt to investigate the possible pathways of hepatitis-C virus (HCV)
infection by integrating the HCV-human interaction network, human protein interactome and human genetic disease 
association network. We have proposed quasi-biclique and quasi-clique mining algorithms to integrate these three 
networks to identify infection gateway host proteins and possible pathways of HCV pathogenesis leading to various 
diseases. Integrated study of three networks, namely HCV-human interaction network, human protein interaction network, 
and human proteins-disease association network reveals potential pathways of infection by the HCV that lead to various 
diseases including cancers. The gateway proteins have been found to be biologically coherent and have high degrees in 
human interactome compared to the other virus-targeted proteins. The analyses done in this study provide possible targets 
for more effective anti-hepatitis-C therapeutic involvement. 



Citation: Mukhopadhyay A, Maulik U (2014) Network-Based Study Reveals Potential Infection Pathways of Hepatitis-C Leading to Various Diseases. PLoS ONE 9(4): 
e94029. doi:10.1371/journal.pone.0094029

Editor: Wenzhe Ho, Temple University School of Medicine, United States of America 
Received November 4, 2013; Accepted March 11, 2014; Published April 17, 2014 

Copyright: © 2014 Mukhopadhyay, Maulik. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which 
permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 

Funding: The authors have no support or funding to report. 

Competing Interests: The authors have declared that no competing interests exist. 
* E-mail: anirban@klyuniv.ac.in 



Introduction 

Hepatitis-C virus (HCV) causes the infectious disease Hepatitis- 
C which primarily affects the liver. It is important to identify the 
potential target human proteins that lead to different diseases 
caused by hepatitis-C virus infection. Analyzing the regulation 
between viral and host proteins in different organisms helps to 
uncover the underlying mechanism of various viral diseases. 
Protein-protein interaction (PPI) information provides a local as 
well as a global view of the interaction modules of proteins 
participating in similar biological activities. Such interaction 
information can be obtained via biological experiments or can 
be predicted using computational approaches [1]. Among the 
experimental methods, yeast two-hybrid (Y2H) screens have been 
widely used by the biologists. The Y2H system can detect both 
transient and stable interactions. The works in [2] and [3] deal 
with the identification of PPIs in Saccharomyces cerevisiae using yeast 
two-hybrid screens. The Y2H approach has also been utilized in 
the analysis of human PPIs in some earlier studies [4,5] . Another 
popularly used experimental method in the context of PPI is mass 
spectrometry which is used to identify the components of protein 
complexes. Use of mass spectrometry method for detecting PPIs 
can be found in [6,7]. 

One of the main goals in research of PPI is to predict possible 
viral-host interactions. This interaction information can be utilized 
to identify and prioritize the important viral-host interactions. This 
is specifically aimed at assisting drug developers targeting protein 
interactions for the development of specially designed small 
molecules to inhibit potential HCV-Human PPIs. Targeting 
protein-protein interactions has relatively recently been established
to be a promising alternative to the conventional approach to drug
design [8,9].

Although there have been many studies on determining and 
analyzing PPIs in a single organism, not much work can be found 
on computational analysis of viral-host interactions. Very recently,
some computational analyses of viral-host interactions,
especially of HIV-1-human PPIs [10-15], have been carried out. Some
recent studies have analyzed the viral-host interactions for some 
individual HCV proteins. For example, in [16], a study on NS2 
protein of HCV is conducted and its role in HCV life cycle is 
discussed. In [17], the interactions of HCV proteins CORE and 
NS4B with human proteins have been analyzed for understanding 
the biological context in HCV pathogenesis. In [18], the authors 
have revealed that the HCV protein NS2 interacts with different 
structural and non-structural proteins for virus assembly. In 
another work [19], an integrative network analysis is performed to 
identify key genes and pathways in the progression of hepatitis C 
virus induced hepatocellular carcinoma. However, no global 
system-wide study based on the HCV-human interaction network 
is available in the literature. Motivated by this, in the present work, the
PPI records between HCV proteins and human (Homo sapiens)
proteins reported in a recently published dataset [20] are collected.
This interaction information, all together, can be visualized as a 
bipartite graph, where two sets of nodes denote HCV proteins and 
human proteins, respectively, and the edges denote the interac- 
tions. In this work, the bipartite network is mined to identify the 
strong interacting modules, which are effectively quasi-bicliques. 
We further extend the study by clustering the human protein- 
protein interaction network to identify the possible quasi-cliques 
that overlap with the quasi-bicliques identified in the previous step. 
The human proteins participating in these quasi-cliques are
considered as gateways of infection and are further investigated for 
their functional characteristics. Subsequently, the bipartite net- 
work representing the association of human proteins with various 
disease types is mined to find possible quasi-bicliques that overlap 
with the gateway proteins discovered in the previous stage. Thus 
we explore three networks, namely, HCV-human interaction 
network, human protein interaction network, and human proteins- 
disease association network globally to discover the potential 
pathways of infection by the HCV viruses that lead to various 
diseases including cancers. The analyses done in this study may 
provide possible targets for more effective anti-hepatitis-C 
therapeutic involvement. 

Materials and Methods 

In the present study, three different networks are mined. First 
one is the HCV-human protein interaction network. This network 
is modeled as a bipartite graph with two sets of nodes, one set 
corresponding to the HCV proteins and the other set correspond- 
ing to the human proteins. The edges represent presence of 
interactions between the corresponding HCV and human 
proteins. The second network is human protein interaction 
network, which is modeled as a graph. Nodes represent the 
human proteins and the edges represent interactions among them. 
The third network represents the associations between human 
proteins and disease. Hence this disease association network is also 
modeled as a bipartite graph with two sets of nodes representing 
human proteins and diseases, respectively. The edges of this graph 
represent the association of the human proteins with diseases. 

Before describing the proposed methods, here we first define a 
few terms to help subsequent discussions [21,22]. 

Definition 1 (Graph). The term graph is used throughout to
denote an unweighted and undirected simple graph (without
self-loops or parallel edges) $G = (V, E)$, where $V$ and $E$ are the vertex
and edge sets, respectively. Here $E$ is represented as a set of
vertex pairs, i.e., $E \subseteq \{(u, v) \mid u, v \in V\}$.

Definition 2 (Degree of a Vertex). The degree of a vertex $v_i$,
denoted as $d(v_i)$, in a graph is the number of edges
incident to it. Hence $d(v_i) = |\{(v_i, v_j) \in E : v_j \neq v_i\}|$.

A graph G = (V,E) may contain subgraphs. A clique is a 
complete subgraph of a graph. 

Definition 3 (Clique). A subgraph $G = (V, E)$ is said to be a
clique if for each vertex pair $u, v \in V$, there is an edge $(u, v)$.

As can be seen, the edge set $E$ of a clique can readily be
obtained from the vertex set $V$, and therefore a clique may be
simply denoted as $G = V$.

Definition 4 (γ-quasi-clique). In a graph $G = (V, E)$, a
subgraph $G' = (V', E')$, $V' \subseteq V$, $E' \subseteq E$, is said to be a
γ-quasi-clique ($0 < \gamma \le 1$) if the subgraph induced by this set of vertices
contains at least $\lceil \gamma \cdot \binom{|V'|}{2} \rceil$ edges.

We denote the cardinality of a vertex set $V$ as $|V|$. A graph is
bipartite if its vertex set can be divided into a pair of
partitions. It is formally defined as follows.

Definition 5 (Bipartite graph). A graph $G = (V, E)$ is said to be
bipartite if its vertex set $V$ can be partitioned into two nonempty
and disjoint sets $V_1$ and $V_2$ such that $E \subseteq \{(u, v) \mid u \in V_1, v \in V_2\}$.

Therefore, a bipartite graph $G = (V, E)$ can also be represented
as $G = (V_1, V_2, E)$. As graphs may have subgraphs, bipartite
graphs may also contain subgraphs. A biclique is a complete
bipartite subgraph.

Definition 6 (Biclique). A bipartite subgraph $G = (V_1, V_2, E)$ is
said to be a biclique if for each vertex pair $u \in V_1$ and $v \in V_2$, there is
an edge $(u, v)$.



As can be seen, the edge set $E$ of a biclique can be readily
obtained from the two vertex sets $V_1, V_2$, and therefore a biclique
may be simply denoted as $G = (V_1, V_2)$.

Definition 7 (γ-quasi-biclique). In a bipartite graph
$G = (V_1, V_2, E)$, a bipartite subgraph $G' = (V_1', V_2', E')$, $V_1' \subseteq V_1$,
$V_2' \subseteq V_2$, $E' \subseteq E$, is said to be a γ-quasi-biclique ($0 < \gamma \le 1$) if the
subgraph induced by these two sets of vertices contains at least
$\lceil \gamma \cdot |V_1'| \cdot |V_2'| \rceil$ edges.
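
The density thresholds in Definitions 4 and 7 can be checked mechanically. The following is a minimal sketch rather than the authors' implementation; the function names, the edge representation (a set of two-element frozensets) and the toy graphs are assumptions made purely for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil


def required_edges(gamma, possible_pairs):
    """Smallest edge count that satisfies the gamma threshold.

    Fraction(str(gamma)) keeps the product exact, so the ceiling is not
    distorted by floating-point round-off.
    """
    return ceil(Fraction(str(gamma)) * possible_pairs)


def is_quasi_clique(vertices, edges, gamma):
    """Definition 4: induced edges >= ceil(gamma * C(|V'|, 2))."""
    vs = sorted(set(vertices))
    present = sum(1 for u, v in combinations(vs, 2) if frozenset((u, v)) in edges)
    return present >= required_edges(gamma, len(vs) * (len(vs) - 1) // 2)


def is_quasi_biclique(left, right, edges, gamma):
    """Definition 7: induced edges >= ceil(gamma * |V1'| * |V2'|)."""
    left, right = set(left), set(right)
    present = sum(1 for u in left for v in right if frozenset((u, v)) in edges)
    return present >= required_edges(gamma, len(left) * len(right))


# Toy graph (hypothetical): 4 of the 6 possible edges among a, b, c, d.
toy_edges = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]}
# Toy bipartite graph (hypothetical): 4 of the 6 possible V-h edges.
toy_bip = {frozenset(p) for p in [("V1", "h1"), ("V1", "h2"), ("V2", "h2"), ("V2", "h3")]}

if __name__ == "__main__":
    print(is_quasi_clique("abcd", toy_edges, 0.6))   # needs ceil(3.6) = 4 edges
    print(is_quasi_biclique(["V1", "V2"], ["h1", "h2", "h3"], toy_bip, 0.5))
```

With γ = 1 both checks reduce to the clique and biclique conditions of Definitions 3 and 6.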

The proposed study consists of three stages. First we mine strong
γ-quasi-bicliques from the first bipartite graph that represents the
interactions between viral and human proteins. The obtained
quasi-bicliques are strong interaction modules consisting of the
HCV and human proteins. Thereafter, in the second stage we
cluster the human protein-protein interaction network to identify
the possible strong γ-quasi-cliques that overlap with the
quasi-bicliques identified in the previous step. The human proteins
participating in these quasi-cliques are considered as gateways of
infection and are further investigated for their functional
characteristics. Subsequently, the bipartite network representing
the association of human proteins with various disease types is
mined to find possible strong γ-quasi-bicliques that overlap with
the gateway proteins discovered in the previous stage. Hence we
explore three networks, namely, HCV-human interaction network,
human protein interaction network, and human proteins-disease
association network globally to discover the potential
pathways of infection by the HCV viruses that lead to various
diseases including cancers. Fig. 1 diagrammatically demonstrates
the study conducted in this article.

In this article we have proposed an algorithm based on
hierarchical clustering that can mine both γ-quasi-cliques and
γ-quasi-bicliques from graphs and bipartite graphs, respectively. The
algorithm is basically a quasi-clique mining algorithm; however,
with a little modification, it can also be used to mine
quasi-bicliques. First we describe the algorithm for mining
quasi-cliques from a graph. Thereafter, we describe how this algorithm is
modified to mine quasi-bicliques.

Mining γ-Quasi-Cliques

The proposed algorithm for mining γ-quasi-cliques is based on the
hierarchical average linkage clustering method [23,24]. Given an
input graph $G = (V, E)$, first the shortest path distances (number of
edges) between all pairs of vertices are computed. Thereafter the
dendrogram is built using the agglomerative average linkage method.
In this method, first a cluster is formed corresponding to each
vertex of the graph. Thereafter the two nearest clusters as per shortest
path distance are combined to form a new cluster. This continues
until there remains only one cluster containing all the vertices. The
distance between any two clusters is computed as the average
distance between all pairs of vertices drawn from the two clusters. The tree
representing the hierarchical relationships among the clusters
formed in this way is called the dendrogram.
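
The dendrogram construction can be sketched as follows. This is a deliberately naive illustration under stated assumptions, not the authors' code: the function names are hypothetical, vertices are assumed to be strings, the dendrogram is represented as nested 2-tuples, and vertex pairs with no connecting path are assigned a distance equal to the number of vertices (the text does not say how such pairs are handled).

```python
from collections import deque


def shortest_path_lengths(adj):
    """All-pairs hop counts by breadth-first search; unreachable pairs are omitted."""
    dist = {}
    for source in adj:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        dist[source] = seen
    return dist


def leaves(tree):
    """Vertices below a dendrogram node (a leaf is any non-tuple value)."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def average_linkage(adj):
    """Agglomerative average-linkage clustering on shortest-path distances."""
    dist = shortest_path_lengths(adj)
    far = len(adj)  # assumption: distance used for disconnected vertex pairs

    def cluster_distance(a, b):
        pairs = [(u, v) for u in leaves(a) for v in leaves(b)]
        return sum(dist[u].get(v, far) for u, v in pairs) / len(pairs)

    trees = sorted(adj)  # start with one cluster per vertex
    while len(trees) > 1:
        index_pairs = ((i, j) for i in range(len(trees)) for j in range(i + 1, len(trees)))
        i, j = min(index_pairs, key=lambda ij: cluster_distance(trees[ij[0]], trees[ij[1]]))
        merged = (trees[i], trees[j])
        trees = [t for k, t in enumerate(trees) if k not in (i, j)] + [merged]
    return trees[0]


def complete_graph(names):
    return {u: {v for v in names if v != u} for u in names}


# Toy graph (hypothetical): two 4-cliques joined by the single bridge d-e.
toy_adj = {**complete_graph("abcd"), **complete_graph("efgh")}
toy_adj["d"].add("e")
toy_adj["e"].add("d")

if __name__ == "__main__":
    print(average_linkage(toy_adj))
```

On this toy graph the final merge joins the two 4-cliques, so the top split of the dendrogram separates them again. The repeated recomputation of cluster distances is kept for readability; a practical implementation would update the distance matrix incrementally.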

After building the dendrogram, we scan it from the top
to the bottom, one step at a time. Every time a
cluster is divided into two, we examine whether each of the two resulting
clusters is a γ-quasi-clique for the given γ value. If a cluster satisfies this
criterion, we do not divide it further, i.e., the subtree
rooted at this cluster is not explored any more and the cluster is
returned as one γ-quasi-clique. The clusters that are not
γ-quasi-cliques are recursively divided as per the dendrogram until they
yield some γ-quasi-clique or reach the threshold of
quasi-clique size (the minimum number of vertices to be present in the
quasi-clique). Hence, the algorithm returns a set of maximal
γ-quasi-cliques, i.e., γ-quasi-cliques which are not completely included
in another γ-quasi-clique.
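
The top-down scan can be illustrated with a short recursive sketch. It assumes the dendrogram is given as nested 2-tuples with string leaves and the graph as a set of two-element frozensets; these representations, the function names and the toy data are illustrative assumptions. Unlike the description above, the sketch also tests the root cluster itself, which only matters if the whole graph is already a γ-quasi-clique.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil


def leaves(tree):
    """Vertices below a dendrogram node (a leaf is any non-tuple value)."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def is_quasi_clique(vertices, edges, gamma):
    """Induced edge count >= ceil(gamma * C(n, 2)), computed exactly."""
    vs = sorted(set(vertices))
    present = sum(1 for u, v in combinations(vs, 2) if frozenset((u, v)) in edges)
    possible = len(vs) * (len(vs) - 1) // 2
    return present >= ceil(Fraction(str(gamma)) * possible)


def scan_dendrogram(tree, edges, gamma, min_size):
    """Return the clusters at which the top-down scan stops."""
    members = leaves(tree)
    if len(members) < min_size:
        return []                      # below the size threshold: give up on this branch
    if is_quasi_clique(members, edges, gamma):
        return [sorted(members)]       # dense enough: do not explore the subtree
    if not isinstance(tree, tuple):
        return []
    return (scan_dendrogram(tree[0], edges, gamma, min_size)
            + scan_dendrogram(tree[1], edges, gamma, min_size))


# Toy data (hypothetical): two 4-cliques joined by the bridge d-e.
toy_edges = {frozenset(p) for p in combinations("abcd", 2)}
toy_edges |= {frozenset(p) for p in combinations("efgh", 2)}
toy_edges.add(frozenset(("d", "e")))
toy_tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))

if __name__ == "__main__":
    print(scan_dendrogram(toy_tree, toy_edges, gamma=0.6, min_size=4))
```

Here the root holds 13 of 28 possible edges, fewer than the 17 required for γ = 0.6, so the scan descends one level and stops at the two 4-cliques.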
Figure 1. The diagrammatic representation of the proposed study. The orange circles represent the HCV proteins. The blue circles represent
the human proteins. The pink circles represent the diseases. The green edges represent the interactions between HCV proteins and human proteins.
The black edges represent the interactions among human proteins. The violet edges represent the associations between human proteins and
diseases. The quasi-bicliques and quasi-cliques are also shown. The quasi-biclique in the HCV-human bipartite network overlaps with the quasi-clique in
the human protein interaction network. The quasi-clique in the human protein interaction network overlaps with the quasi-biclique in the human
protein-disease association network.
doi:10.1371/journal.pone.0094029.g001



Mining γ-Quasi-Bicliques

The algorithm for mining γ-quasi-bicliques, which are
equivalent to biclusters [25], is exactly the same as that for mining γ-quasi-cliques;
the only modification is made in the distance matrix. In this case
also, we compute the shortest path between the nodes in the input
bipartite graph $G = (V_1, V_2, E)$. Note that here the distance
between two vertices $u \in V_1$ and $v \in V_2$ can be any odd value $\ge 1$,
since $u$ and $v$ may not be directly connected, but there may be a
path between the two that contains a number of vertices from $V_1$
and $V_2$ in alternating positions. Any two vertices $u_1, u_2 \in V_1$ are
never connected directly in a bipartite graph; however, they may
be connected through a set of vertices from $V_2$ and $V_1$ in an
alternating fashion, and thus the distance between any two vertices
in $V_1$ is always an even value $\ge 2$. Similar is the case for any two
vertices in set $V_2$.

In our study, the number of HCV proteins (set $V_1$) is far
less than the number of human proteins (set $V_2$). Therefore, to
increase the participation of HCV proteins in the
γ-quasi-bicliques, we have modified the distance function between two
viral proteins. In the modified version, the distance between any
two viral proteins that are connected by a series of alternating
human and viral proteins, i.e., which belong to the same
connected component in the bipartite graph, is made 1. Thus
the viral proteins that belong to the same connected component
come closer to each other virtually and the number of viral
proteins in the γ-quasi-bicliques increases. A similar approach is
adopted while finding the quasi-bicliques between the human
proteins and diseases to increase the participation of the human
proteins.
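
The modified distance function can be sketched as below. This is an illustration under assumptions rather than the authors' code: the function names and the toy network are hypothetical, and pairs of vertices in different connected components are simply left without a distance entry.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from `source` to every reachable vertex."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen[w] = seen[u] + 1
                queue.append(w)
    return seen


def modified_distances(adj, viral):
    """Shortest-path distances with viral-viral distances collapsed to 1.

    Two viral proteins in the same connected component would otherwise be
    at an even distance >= 2; here that distance is replaced by 1.
    """
    dist = {s: bfs_distances(adj, s) for s in adj}
    for u in viral:
        for v in viral:
            if u != v and v in dist[u]:   # reachable => same connected component
                dist[u][v] = 1
    return dist


# Toy bipartite network (hypothetical): V1, V2 share a component, V3 does not.
toy_adj = {
    "V1": {"h1"}, "V2": {"h1", "h2"}, "V3": {"h3"},
    "h1": {"V1", "V2"}, "h2": {"V2"}, "h3": {"V3"},
}
toy_dist = modified_distances(toy_adj, viral={"V1", "V2", "V3"})

if __name__ == "__main__":
    print(toy_dist["V1"])
```

In the toy network the V1 to V2 distance drops from 2 to 1, the viral-human distance from V1 to h2 stays at the odd value 3, and the human-human distance from h1 to h2 stays at the even value 2.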

Databases and Preprocessing 

As stated before, we deal with three networks, namely,
HCV-human PPI network, human PPI network and human protein-disease
association network. In this section, the collection and
preprocessing of the datasets are described.

HCV-Human Protein Interaction Database 

The protein interaction information between the HCV proteins
and human proteins has been collected from a recently
developed HCV-human protein interaction database called
HCVpro [20], publicly available at http://cbrc.kaust.edu.sa/hcvpro/.
This viral-host PPI database has been manually curated
and it stores only those HCV-human PPIs that pass through a very
strict filtering process [20]. Hence this repository maintains
very high-quality PPI information. It can be noted that there is another
well-known and widely used database of hepatitis C-human
protein interactions, which is available at [26]. However, we found
that the HCVpro database covers ~94% of the interactions
present in that database. Therefore we decided to use the newer
database HCVpro. The HCVpro database contains the
interactions among 11 HCV proteins (CORE, E1, E2, F, NS2, NS3,
NS4A, NS4B, NS5A, NS5B, p7) and 455 human proteins. The
total number of interactions is 549. The interactions are given in
File S1. Fig. 2 shows the distribution of the interactions with
respect to each of the HCV proteins. It is evident from the figure
that the HCV protein NS3 interacts with the maximum number of
human proteins (218), whereas NS2 is found to interact with the
minimum number of human proteins (8). Among the other HCV
proteins, NS5A and CORE have a reasonable number of
interactions with the human proteins (115 and 94, respectively). After
removing the redundant interactions, the number of unique
interactions reduces to 524. These 524 interactions among 11
HCV proteins and 455 human proteins are used for preparing the
bipartite network between viral and host proteins, and the maximal
γ-quasi-bicliques are mined from this bipartite network as
described in the previous section.
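
The redundancy removal and the per-protein interaction counts of Fig. 2 can be illustrated with a small sketch. It assumes that "redundant" means an exactly repeated (HCV protein, human protein) pair, which the text does not spell out; the function names and the handful of example records are illustrative only and are not the HCVpro data.

```python
from collections import Counter


def deduplicate(records):
    """Collapse repeated (HCV protein, human protein) pairs into unique ones."""
    return sorted(set(records))


def viral_degrees(unique_pairs):
    """Number of distinct human partners per HCV protein."""
    return Counter(viral for viral, _ in unique_pairs)


# A few example records; the pair ("NS3", "TP53") is deliberately repeated.
toy_records = [
    ("NS3", "TP53"), ("NS3", "TP53"), ("CORE", "TP53"),
    ("NS3", "PSMB9"), ("NS5A", "PSMB9"),
]
toy_unique = deduplicate(toy_records)
toy_degrees = viral_degrees(toy_unique)

if __name__ == "__main__":
    print(len(toy_records), "records ->", len(toy_unique), "unique interactions")
    print(toy_degrees.most_common())
```

The unique pairs are exactly the edge list of the bipartite network from which the quasi-bicliques are mined.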
Human Protein Interaction Database

The primary objective of mining the human protein interaction
database is to find γ-quasi-cliques that overlap with the
γ-quasi-bicliques identified in the previous stage of the study. Hence, to
avoid the huge computational complexity of mining quasi-cliques
from the complete human protein interaction database, we
concentrate only on the part of the human PPI network that contains the
human proteins present in the γ-quasi-bicliques identified in the
previous stage. For this, the functional protein association network
STRING (http://string-db.org/) has been utilized. For each
quasi-biclique identified in the previous stage, the participating
human proteins are given as input to STRING, and STRING
generates an interactome containing these human proteins and
other additional human proteins. We consider the predictions
based on co-expression, experiments and databases only. We
consider only the interactions with a confidence of at least 0.8 (on a
confidence scale between 0 and 1). This ensures that we consider
only those PPIs that have a reasonable amount of evidence in the
literature. The maximum number of interactions per protein is set to
100. On the resultant PPI network, the γ-quasi-clique mining algorithm
described in the previous section is applied to obtain any quasi-clique
that overlaps the previously mined quasi-biclique on which the
present human PPI network has been built.

Human Protein-Disease Association Database 

The Genetic Disease Association Database [27]
(http://geneticassociationdb.nih.gov/) archives the human genetic
association studies on various types of complex diseases and disorders.
The database contains summary data extracted from published
articles in peer reviewed journals on candidate gene and GWAS
studies. The database contains both positive (if the gene/protein is
known to have association with the phenotype) and negative (if a
gene/protein is known to have lack of association with the
phenotype) associations, and also unknown (no specific
information) associations. The network has been given in File S3. All the
gene-disease association information has been downloaded from
the database and the associations other than positive ones are
filtered out. We found approximately 4200 unique diseases which
are associated with approximately 3600 human genes/proteins,
resulting in a total of approximately 12400 unique gene-disease
associations. In Fig. 3, we have demonstrated the distributions of
associations with respect to both diseases and genes. In both cases,
it can be noticed that only a few diseases have associations with many
human proteins, but most of the diseases are associated with only a
few human proteins. The density of this bipartite network is
only ~0.0007, which indicates the sparseness of the network. The
human proteins belonging to the quasi-cliques identified in the
previous stage are considered and the bipartite network with these
human proteins and diseases connected to them is formed.
Thereafter, the γ-quasi-biclique mining algorithm is applied to this
bipartite network to obtain the strong maximal quasi-bicliques
from this network.

Results and Discussion 

In this section, we discuss the results of the proposed study. 

Mining Quasi-Bicliques in HCV-Human Protein Interaction 
Network 

First we apply the proposed γ-quasi-biclique mining algorithm
on the HCV-human protein interaction network collected from
HCVpro. The value of γ has been set to 0.5. This is done as
follows. We varied the γ value from 0.1 to 0.9 with step size 0.1 and
varied the minimum number of HCV proteins present in a
quasi-biclique, n, from 2 to 5 with step size 1. For each combination of γ
and n the algorithm is executed. In each case, the statistical
significance of the set of resultant quasi-bicliques (if found) is
investigated. To test the statistical significance of a quasi-biclique
of size x × y, the bipartite graph is perturbed randomly 10,000 times
(without changing the degrees of HCV proteins) and a
quasi-biclique of size x × y is picked up randomly from the perturbed
graph. Then we conduct the Wilcoxon rank-sum test to find
whether the density of the actual quasi-biclique is significantly
better than the mean density of the random quasi-bicliques of



Figure 2. Distribution of interactions in the Hepatitis-C-Human bipartite interaction network with respect to the 11 HCV proteins.
The HCV protein NS3 interacts with maximum number of human proteins (218), whereas NS2 is found to interact with minimum number of human
proteins (8). Among the other HCV proteins, NS5A and CORE have reasonable number of interactions with the human proteins (115 and 94,
respectively).
doi:10.1371/journal.pone.0094029.g002

Figure 3. Distribution of associations in the human gene-disease association network. The left hand side figure shows the distribution of
associations with respect to all the diseases. The right hand side figure shows the distribution of associations with respect to all the genes.
doi:10.1371/journal.pone.0094029.g003



same size. This returns a p-value; the lower the p-value, the more
significant is the quasi-biclique under consideration. For a
combination of γ and n values, the average p-value over all the
quasi-bicliques obtained is computed, and we found that for γ = 0.5
and n = 3 the average p-value is minimum. Hence we set the γ
value to 0.5 and only quasi-bicliques having at least three HCV
proteins (n = 3) are considered. This results in two
quasi-bicliques, QB1 and QB2. Different statistics about the
two quasi-bicliques found are reported in Table 1. The densities
(i.e., the ratio of the number of interactions present in the
quasi-biclique to the maximum possible number of interactions) of
the two quasi-bicliques obtained are 0.6786 and 0.5400,
respectively. The first quasi-biclique consists of the HCV proteins
CORE, NS3 and NS5A and 28 human proteins. Note that these
three HCV proteins are the top three highest degree HCV
proteins in the network. The other quasi-biclique consists of five
HCV proteins E1, E2, NS2, NS4A and NS5B and 10 human
proteins.
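
The randomization idea behind the significance test can be sketched as follows. Note the substitution: instead of the Wilcoxon rank-sum test used in the study, this sketch reports a simple empirical p-value (the fraction of random blocks at least as dense as the observed one), which keeps it within the Python standard library. The function names, the rewiring scheme, the reduced number of trials and the toy network are all assumptions made for illustration.

```python
import random


def density(viral, humans, edges):
    """Fraction of the |viral| x |humans| possible interactions that are present."""
    present = sum(1 for v in viral for h in humans if (v, h) in edges)
    return present / (len(viral) * len(humans))


def perturb(edges, all_humans, rng):
    """Rewire every viral protein to random human partners, keeping its degree."""
    degree = {}
    for viral, _ in edges:
        degree[viral] = degree.get(viral, 0) + 1
    return {(v, h) for v, k in sorted(degree.items()) for h in rng.sample(all_humans, k)}


def empirical_p_value(viral, humans, edges, all_viral, all_humans, trials=200, seed=0):
    """Share of random same-size blocks that are at least as dense as the observed one."""
    rng = random.Random(seed)
    observed = density(viral, humans, edges)
    hits = 0
    for _ in range(trials):
        randomized = perturb(edges, all_humans, rng)
        random_viral = rng.sample(all_viral, len(viral))
        random_humans = rng.sample(all_humans, len(humans))
        if density(random_viral, random_humans, randomized) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)   # add-one correction avoids a p-value of 0


# Toy network (hypothetical): two viral proteins sharing both of their partners.
toy_viral = ["V1", "V2"]
toy_humans = [f"h{i}" for i in range(20)]
toy_edges = {("V1", "h0"), ("V1", "h1"), ("V2", "h0"), ("V2", "h1")}

if __name__ == "__main__":
    p = empirical_p_value(toy_viral, ["h0", "h1"], toy_edges, toy_viral, toy_humans)
    print(f"empirical p-value: {p:.4f}")
```

Sampling partners without replacement guarantees that each viral protein keeps its original degree in every perturbed network, mirroring the constraint stated above.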

Mining Quasi-Cliques in Human Protein Interaction 
Network 

In the next stage, as discussed before, the human proteins
participating in the quasi-bicliques are given as the input to the
STRING database. The human proteins involved in the first
quasi-biclique QB1 (Table 1) are first given to the STRING
database with the parameter setting described in the section on the
human protein interaction database. This
induces a human interactome consisting of 120 human proteins
(Fig. 4 shows the interactome). Although this network is very
sparse (density ~0.07), a few denser regions are clearly visible from
the figure. We then apply the quasi-clique mining algorithm
described before, with the γ value fixed to 0.6 and the minimum
number of nodes allowed set to 4. We obtained 9 dense
quasi-cliques from the interactome. Out of these 9 quasi-cliques, 5 have
overlaps with the first quasi-biclique discovered in the previous
stage. Different statistics of these 5 quasi-cliques are shown in
Table 2.
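
The overlap between a quasi-biclique and the quasi-cliques is a plain set intersection. The sketch below reproduces the "overlapping proteins" column of Table 2 from the protein lists given in Tables 1 and 2; the variable names are illustrative, and the protein symbols are taken as printed in the tables.

```python
# Human proteins of the first quasi-biclique QB1 (Table 1).
QB1_HUMAN = {
    "EFEMP1", "EIF2AK2", "FBLN2", "FBLN5", "FTH1", "HIVEP2", "HNRNPK", "JAK1",
    "KPNA1", "LTBP4", "MAGED1", "NAP1L1", "NAP1L2", "PSMB9", "PSME3", "RNF31",
    "SMAD3", "STAT1", "STAT3", "TBP", "TLR2", "TP53", "TP53BP2", "TRADD",
    "TRAF2", "TXNDC11", "VIM", "VWF",
}

# Quasi-cliques QC1 to QC5 (Table 2).
QUASI_CLIQUES = {
    "QC1": {"POMP", "PSMA2", "PSMB10", "PSMB7", "PSMB8", "PSMB9", "PSME3", "RFWD2"},
    "QC2": {"BIRC2", "BIRC3", "CASP8", "FADD", "GATA5", "MAP3K5", "RIPK1",
            "TNFRSF1A", "TNFRSF1B", "TRADD", "TRAF1", "TRAF2", "UBC", "VIM"},
    "QC3": {"CIP", "EDF1", "GTF2A1", "GTF2A2", "GTF2B", "GTF2E1", "GTF2F1",
            "HNRNPK", "MYST1", "SETD7", "SF3A2", "TAF1", "TAF10", "TAF11",
            "TAF12", "TAF13", "TAF2", "TAF2E", "TAF3", "TAF4", "TAF5", "TAF7", "TBP"},
    "QC4": {"HDAC1", "HIPK2", "MDM2", "MDM4", "SUMO1", "TP53", "UBE2I", "USP7"},
    "QC5": {"EGFR", "IL6ST", "JAK1", "PIAS3", "SRC", "STAT1", "STAT2", "STAT3"},
}

# Gateway candidates: proteins shared by QB1 and each quasi-clique.
overlaps = {name: sorted(QB1_HUMAN & members) for name, members in QUASI_CLIQUES.items()}

if __name__ == "__main__":
    for name, shared in overlaps.items():
        print(name, shared)
```

The shared proteins are the ones treated as candidate infection gateways in the analysis that follows.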

After application of the quasi-clique finding algorithm on the
interactome induced by the second quasi-biclique QB2 of Table 1,
it provides 4 quasi-cliques that overlap this quasi-biclique. The
interactome induced by the second quasi-biclique consists of 79
human proteins (this interactome has been shown in Fig. 5). This
network has a density of ~0.22. However, here also, a few denser
regions can be noticed from the figure. The 4 quasi-cliques as
found by the algorithm have been reported in Table 3. It is evident
from the table that these quasi-cliques overlap with the second
quasi-biclique on only one human protein each. Both the human
interactomes induced by quasi-bicliques QB1 and QB2 are
reported in File S2. All the quasi-bicliques and quasi-cliques are
reported in File S4.
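
The quoted interactome densities follow from the node and edge counts reported in the captions of Figs. 4 and 5 (120 proteins with 509 interactions, and 79 proteins with 693 interactions). A minimal check, assuming the usual density of a simple undirected graph, i.e., the number of edges divided by the number of vertex pairs; the function name is illustrative.

```python
def graph_density(num_nodes, num_edges):
    """Edges divided by the number of unordered vertex pairs."""
    possible = num_nodes * (num_nodes - 1) // 2
    return num_edges / possible


density_qb1 = graph_density(120, 509)   # interactome induced by QB1
density_qb2 = graph_density(79, 693)    # interactome induced by QB2

if __name__ == "__main__":
    print(round(density_qb1, 4))  # 0.0713
    print(round(density_qb2, 4))  # 0.2249
```

Both values agree with the approximate densities of 0.07 and 0.22 stated in the text.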

GO and Pathway Analyses of Quasi-Cliques 

Subsequently we further analyze the quasi-cliques found
(Tables 2 and 3) using Gene Ontology (GO) and pathway based
studies. Let us denote the 9 quasi-cliques of Tables 2 and 3 by
{QC1, QC2, ..., QC9}, respectively. For the GO and pathway
analyses, the web-based tool DAVID (http://david.abcc.ncifcrf.gov/)
has been used. Table 4 shows the top few significant GO
and KEGG pathway terms for the 9 quasi-cliques along with the
significance p-values. It is evident from the table that all the
quasi-cliques have significant GO and KEGG pathways associated
with them, with one exception for QC1, for which no significant
KEGG pathway has been found. QC1 mainly consists of
proteins that function in negative regulation of ubiquitin and
participate in the proteasome complex, whose main function is to
degrade unneeded or damaged proteins by proteolysis, a chemical
reaction that breaks peptide bonds. The relationship between
ubiquitin, proteasome and hepatitis-C has already been reported
in the literature [28,29], and it involves the HCV protein CORE. It may
be noticed that the HCV CORE protein belongs to the first
quasi-biclique (QB1 in Table 1), which overlaps with the quasi-clique
QC1. The overlap between QB1 and QC1 consists of two human
proteins, PSMB9 and PSME3, and thus they may be considered as
possible infection gateways for the HCV proteins CORE (interacts
with PSME3), NS3 (interacts with PSMB9) and NS5A (interacts
with PSMB9), which belong to quasi-biclique QB1, for attacking
the proteasome complex.

The quasi-clique QC2 contains 14 human proteins mostly
involved in apoptosis and programmed cell death. It is also
interesting that a significant GO-CC term for these proteins is
death-inducing signaling complex. Further, these proteins also
participate in the KEGG pathway apoptosis as well as pathways in
cancer. This evidence strongly suggests that the human proteins
involved in this quasi-clique have a direct or indirect relationship to
cancer diseases. The quasi-biclique QB1 (involving the viral
Table 1. Quasi-bicliques found from HCV-human protein interaction database.

Quasi-biclique | HCV proteins | Human proteins | Density
QB1 | Count: 3; CORE, NS3, NS5A | Count: 28; EFEMP1, EIF2AK2, FBLN2, FBLN5, FTH1, HIVEP2, HNRNPK, JAK1, KPNA1, LTBP4, MAGED1, NAP1L1, NAP1L2, PSMB9, PSME3, RNF31, SMAD3, STAT1, STAT3, TBP, TLR2, TP53, TP53BP2, TRADD, TRAF2, TXNDC11, VIM, VWF | 0.6786
QB2 | Count: 5; E1, E2, NS2, NS4A, NS5B | Count: 10; CALR, CANX, CD209, CLEC4M, HOXD8, HSPA5, LTF, NR4A1, SETD2, UBQLN1 | 0.5400

The HCV proteins and human proteins involved in the quasi-bicliques are reported along with the densities of the quasi-bicliques.
doi:10.1371/journal.pone.0094029.t001



proteins CORE, NS5A and NS3) overlaps with QC2 on three 
human proteins TRADD (interacts with CORE and NS5A), 
TRAF2 (interacts with CORE and NS5A) and VIM (interacts 
with CORE and NS3). This suggests that attack by HCV proteins 
CORE, NS5A and NS3 may lead to cancer through apoptosis and 
the main gateway host proteins responsible for that are TRADD, 
TRAF2 and VIM. 

The 23 host proteins in quasi-clique QC3 are mainly
transcription factors (Table 4). Although the quasi-biclique QB1
only overlaps with QC3 on two host proteins, HNRNPK and TBP,
it suggests that the viral proteins in QB1 may indirectly interact
with many transcription factor proteins and thus may cause their
malfunctioning. This may lead to breakdown of the overall setup
of normal regulatory roles of these transcription factors, causing
serious infectious behavior.

Most of the host proteins in the quasi-clique QC4 negatively
regulate transcription and participate in enzyme binding. It can be
noticed that many of these proteins are part of PML bodies, which
are a class of nuclear bodies, and they react against SP100
auto-antibodies (PML, promyelocytic leukemia). This is in fact also



Figure 4. Human protein interactome induced by first quasi-biclique QB1. The interactome consists of 120 human proteins and 509
interactions among them. The density of the interactome is nearly 0.07.
doi:10.1371/journal.pone.0094029.g004
Table 2. Quasi-cliques found from human protein interactome that overlap with the human proteins involved in the first quasi-biclique of Table 1.

Quasi-clique | Human proteins | Density | Overlapping proteins with first quasi-biclique
QC1 | Count: 8; POMP, PSMA2, PSMB10, PSMB7, PSMB8, PSMB9, PSME3, RFWD2 | 0.6786 | PSMB9, PSME3
QC2 | Count: 14; BIRC2, BIRC3, CASP8, FADD, GATA5, MAP3K5, RIPK1, TNFRSF1A, TNFRSF1B, TRADD, TRAF1, TRAF2, UBC, VIM | 0.6484 | TRADD, TRAF2, VIM
QC3 | Count: 23; CIP, EDF1, GTF2A1, GTF2A2, GTF2B, GTF2E1, GTF2F1, HNRNPK, MYST1, SETD7, SF3A2, TAF1, TAF10, TAF11, TAF12, TAF13, TAF2, TAF2E, TAF3, TAF4, TAF5, TAF7, TBP | 0.6324 | HNRNPK, TBP
QC4 | Count: 8; HDAC1, HIPK2, MDM2, MDM4, SUMO1, TP53, UBE2I, USP7 | 0.6429 | TP53
QC5 | Count: 8; EGFR, IL6ST, JAK1, PIAS3, SRC, STAT1, STAT2, STAT3 | 0.7143 | JAK1, STAT1, STAT3

The human proteins involved in the quasi-cliques are reported along with the densities of the quasi-cliques and the overlapping human proteins with the first quasi-biclique.
doi:10.1371/journal.pone.0094029.t002




Figure 5. Human protein interactome induced by second quasi-biclique QB2. The interactome consists of 79 human proteins and 693
interactions among them. The density of the interactome is nearly 0.22.
doi:10.1371/journal.pone.0094029.g005
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Table 3. Quasi-cliques found from human protein interactome that overlap with the human proteins involved in the second 
quasi-biclique of Table 1. 



Quasi-clique 


Human proteins 


Density 


Overlapping proteins with second 
quasi-biclique 


QC6 


Count: 4 








PLOD1, PLOD2, PLOD3, SETD2 


0.8333 


SETD2 


QC7 


Count: 5 








NBL1, PSMD4, UBA52, UBC, UBQLN1 


0.7000 


UBQLN1 


QC8 


Count: 45 








BCL2, CD3D, CREBBP, EP300, ESR1, ESR2, 
ESRRA, ESRRB, ESRRG, FOSB, GNG2, HNF4A, 
HNF4G, MAPK7, MEF2D, NFATC2, NR0B2, 
NR1D1, NR1D2, NR1H2, NR2C1, NR2C2, 
NR2C2AP, NR2E1, NR2F1, NR2F6, NR4A1, 
NR4A2, NR5A1, NRBP1, POMC, PPARA, PPARD, 
PPARG, RARA, RARB, RARG, RORA, RORB, 
RORC, RXRA, RXRG, THRA, THRB, VDR 


0.6364 


NR4A1 


QC9 


Count: 5 








APOB, C1QA, C1QB, C1QC, CALR 


0.7000 


CALR 


The human proteins involved in the quasi-cliques are reported along with the densities of the q 
biclique. 

doi:1 0.1 371 /joumal.pone.0094029.t003 


uasi-cliques 


and the overlapping human proteins with the second quasi- 



evident from the pathway analysis, which finds two significant 
KEGG pathways, namely p53 signaling pathway and chronic 
myeloid leukemia. For the quasi-biclique QB1, the viral gateway to 
these host proteins is TP53, a membrane protein that is common 
to QB1 and QC4. Noticeably, all the viral proteins of QB1, i.e., 
CORE, NS5A and NS3, interact with TP53 to gain entry. This 
infection may ultimately lead to chronic myeloid leukemia [30]. 

The quasi-clique QC5 contains host proteins with mainly kinase 
activities. Two significant KEGG pathways, namely JAK-STAT 
signaling pathway and pancreatic cancer, have been identified in 
this quasi-clique. This suggests that the HCV proteins in QB1 
interact with the host proteins in QC5 through the common host 
proteins JAK1, STAT1 and STAT3, leading to pancreatic cancer. 
Moreover, the JAK-STAT system transmits information from 
chemical signals outside the cell, through the cell membrane. 
Therefore the proteins involved in QC5 are possibly involved in 
transferring and propagating the infection to other cells. A study 
in [31] has already established the involvement of HCV in the 
JAK-STAT signaling pathway. 

The quasi-cliques QC6 through QC9 (Table 3) overlap with the 
quasi-biclique QB2, which consists of 5 viral proteins (E1, E2, NS2, 
NS4A and NS5B) and 10 host proteins. QB2 overlaps with QC6 
through the host protein SETD2. The most significant GO terms 
associated with the human proteins in QC6 in the BP, MF and CC 
categories are oxidation reduction, procollagen-lysine 5-dioxygenase 
activity and endoplasmic reticulum, respectively. The most 
significant KEGG pathway associated with these proteins is Lysine 
degradation, in which all 4 proteins of QC6 are involved. The 
association of the HCV NS2 protein with lysine degradation is also 
reported in [32]. 
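
The p-value and percentage attached to each term in Table 4 can be read as an over-representation test. The sketch below is only illustrative: DAVID reports a modified Fisher exact p-value (the EASE score), whereas this code uses the plain hypergeometric upper tail as a simpler stand-in, and the background size and pathway size are hypothetical numbers that do not come from the study. Only the QC6 figures (4 proteins, all 4 annotated) are taken from the text.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): n proteins drawn from N, of which K carry the annotation."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# QC6: 4 proteins, all 4 annotated with "Lysine degradation" (from the text).
n_module, k_hits = 4, 4

# Hypothetical background and pathway sizes, for illustration only.
N_background, K_pathway = 20000, 44

coverage = 100.0 * k_hits / n_module  # percentage column of Table 4
p_value = hypergeom_upper_tail(k_hits, n_module, K_pathway, N_background)
print(f"coverage = {coverage:.1f}%, p = {p_value:.2e}")
```

Because the background sizes are assumed, the printed p-value is not expected to match the value reported in Table 4; only the coverage of 100% is directly comparable.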

QC7 overlaps QB2 through the host protein UBQLN1. Like QC1, 
QC7 has proteasomal activities, and as discussed before the host 
proteins in this functional module are involved in hepatitis C 
infection. However, we could not find any significant pathway for 
QC7. 

QC8 is the largest quasi-clique that we have found in the 
present study. This functional module consists of 45 host proteins, 
which are mostly transcription factors. The infection gateway to 
this module is NR4A1, which is the only common host protein of 
QB2 and QC8. Interestingly, all five viral proteins in QB2 
interact with NR4A1, and the CORE protein, which is a part of 
QB1, also interacts with NR4A1. This observation suggests that 
NR4A1 serves as a very important gateway to this transcription 
factor complex. Any disturbance to this module caused by viral 
infection may lead to malfunctioning of the normal gene regulatory 
network, and this in turn can result in various types of cancer (as 
the pathway study reveals). Our pathway study also reveals another 
significant pathway, namely PPAR signaling pathway, which has 
also been shown to be associated with HCV infection in recent 
studies [33]. 

The quasi-clique QC9 consists of 5 host proteins, which 
have been found to be associated with protein maturation and 
humoral immune response mediated by circulating immunoglobulin. 
Thus these proteins are highly responsible for maintaining 
the immune system inside the human body. QB2 and QC9 have one 
common host protein, CALR, and hence this protein serves as a 
gateway of attack on the immune system by HCV. The viral 
proteins E1 and E2 (envelope proteins), which are major players in 
all events required for virus entry into target cells, interact with 
CALR and start attacking the immune system. This may 
ultimately lead to many prion diseases (as revealed through 
pathway analysis). 
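
The gateway proteins discussed above are exactly the entries of the overlap columns of Tables 2 and 3. As a minimal sketch, the overlaps can be collected into one set of viral-host gateway proteins; the variable and function names below are illustrative and not part of the original method.

```python
# Overlap columns of Tables 2 and 3: quasi-clique -> host proteins shared
# with the corresponding quasi-biclique (QB1 for QC1-QC5, QB2 for QC6-QC9).
overlap_qb1 = {
    "QC1": {"PSMB9", "PSME3"},
    "QC2": {"TRADD", "TRAF2", "VIM"},
    "QC3": {"HNRNPK", "TBP"},
    "QC4": {"TP53"},
    "QC5": {"JAK1", "STAT1", "STAT3"},
}
overlap_qb2 = {
    "QC6": {"SETD2"},
    "QC7": {"UBQLN1"},
    "QC8": {"NR4A1"},
    "QC9": {"CALR"},
}

def gateways(overlaps):
    """Union of all overlapping host proteins for one quasi-biclique."""
    return set().union(*overlaps.values())

# All viral-host gateway proteins across both quasi-bicliques.
vh = gateways(overlap_qb1) | gateways(overlap_qb2)
print(len(vh), sorted(vh))
```

Since the overlaps of the individual quasi-cliques are disjoint, the size of the union equals the sum of the overlap counts in the two tables.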

The GO and pathway analyses of the identified quasi-cliques in 
the human protein interaction network reveal that the host proteins 
involved in these functional modules have a high degree of 
functional similarity. Moreover, as discussed, HCV attacks that 
go through these quasi-cliques may lead to malfunctioning of the 
regulatory and immune systems in targeted cells and may lead 
to different types of disease, including various types of cancer. 
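
The densities quoted for the induced interactomes (Figures 4 and 5) and for the quasi-cliques (Tables 2 and 3) can be checked with a short calculation. The sketch below assumes that density means the number of interactions divided by the number of possible protein pairs, n(n-1)/2; this definition is an assumption, chosen because it reproduces the figure captions. The number of interactions inside each quasi-clique is not reported, so it is back-calculated here from the reported density.

```python
def clique_density(n_nodes, n_edges):
    """Edges present divided by the number of possible undirected pairs."""
    return n_edges / (n_nodes * (n_nodes - 1) / 2)

def implied_edges(n_nodes, density):
    """Edge count implied by a reported density (rounded to an integer)."""
    return round(density * n_nodes * (n_nodes - 1) / 2)

# Induced interactomes: (proteins, interactions) from Figures 4 and 5.
print("QB1 interactome:", round(clique_density(120, 509), 2))
print("QB2 interactome:", round(clique_density(79, 693), 2))

# Quasi-cliques: (protein count, reported density) from Tables 2 and 3.
reported = {
    "QC1": (8, 0.6786), "QC2": (14, 0.6484), "QC3": (23, 0.6324),
    "QC4": (8, 0.6429), "QC5": (8, 0.7143), "QC6": (4, 0.8333),
    "QC7": (5, 0.7000), "QC8": (45, 0.6364), "QC9": (5, 0.7000),
}
for name, (n, d) in reported.items():
    edges = implied_edges(n, d)
    print(name, "implied interactions:", edges,
          "density:", round(clique_density(n, edges), 4))
```

Under this assumed definition, every reported density corresponds to a whole number of interactions, which suggests the tables use the same pairwise definition as the figure captions.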

Mining Quasi-Bicliques in Human Protein-Disease 
Association Network 

To study the disease association with the host proteins in the 
identified quasi-cliques for finding possible pathway of pathogen- 
esis leading to various diseases, we apply our quasi-biclique finding 
algorithm on the human gene-disease association network. Note 

Table 4. The significant important GO terms and KEGG pathways found in the quasi-cliques. 

QC1 
Biological Process: negative regulation of ubiquitin-protein ligase activity during mitotic cell cycle (p-value: 4.6e-11, 75%) 
Molecular Function: threonine-type endopeptidase activity (p-value: 6.1e-11, 62.5%) 
Cellular Component: proteasome complex (p-value: 6.4e-14, 87.5%) 
KEGG Pathway: Proteasome (p-value: 3.1e-12, 87.5%) 

QC2 
Biological Process: apoptosis (p-value: 8.9e-14, 85.7%); programmed cell death (p-value: 1.1e-13, 85.7%) 
Molecular Function: death domain binding (p-value: 5.0e-3, 14.3%) 
Cellular Component: membrane raft (p-value: 3.9e-8, 42.9%); death-inducing signaling complex (p-value: 5.5e-6, 21.4%) 
KEGG Pathway: Apoptosis (p-value: 1.1e-10, 57.1%); pathways in cancer (p-value: 3.6e-4, 42.9%) 

QC3 
Biological Process: transcription initiation from RNA polymerase II promoter (p-value: 6.0e-29, 71.4%) 
Molecular Function: general RNA polymerase II transcription factor activity (p-value: 4.9e-20, 52.4%) 
Cellular Component: DNA-directed RNA polymerase II, holoenzyme (p-value: 4.1e-30, 76.2%) 
KEGG Pathway: Basal transcription factors (p-value: 1.8e-29, 71.4%) 

QC4 
Biological Process: negative regulation of transcription (p-value: 1.0e-8, 87.5%) 
Molecular Function: enzyme binding (p-value: 1.2e-3, 50.0%) 
Cellular Component: PML body (p-value: 1.6e-7, 50.0%) 
KEGG Pathway: p53 signaling pathway (p-value: 1.0e-3, 37.5%); Chronic myeloid leukemia (p-value: 1.3e-3, 37.5%) 

QC5 
Biological Process: protein kinase cascade (p-value: 2.8e-9, 87.5%) 
Molecular Function: protein tyrosine kinase activity (p-value: 3.3e-3, 37.5%) 
Cellular Component: dendrite (p-value: 7.4e-2, 25.0%) 
KEGG Pathway: Jak-STAT signaling pathway (p-value: 4.9e-7, 75.0%); Pancreatic cancer (p-value: 9.1e-5, 50.0%) 

QC6 
Biological Process: oxidation reduction (p-value: 1.0e-4, 100.0%) 
Molecular Function: procollagen-lysine 5-dioxygenase activity (p-value: 1.1e-7, 75.0%) 
Cellular Component: endoplasmic reticulum (p-value: 5.6e-3, 75.0%) 
KEGG Pathway: Lysine degradation (p-value: 6.0e-7, 100.0%) 

QC7 
Biological Process: anaphase-promoting complex-dependent proteasomal ubiquitin-dependent protein catabolic process (p-value: 2.3e-5, 60.0%) 
Molecular Function: structural constituent of ribosome (p-value: 2.6e-2, 40.0%) 
Cellular Component: cytosolic small ribosomal subunit (p-value: 1.2e-2, 40.0%); proteasome complex (p-value: 1.9e-2, 40.0%) 
KEGG Pathway: none reported 

QC8 
Biological Process: regulation of transcription, DNA-dependent (p-value: 4.3e-27, 84.4%) 
Molecular Function: steroid hormone receptor activity (p-value: 6.1e-75, 73.3%); transcription factor activity (p-value: 7.7e-37, 84.4%) 
Cellular Component: nuclear lumen (p-value: 1.4e-3, 20.0%); transcription factor complex (p-value: 4.7e-3, 8.9%) 
KEGG Pathway: Pathways in cancer (p-value: 1.7e-5, 20.0%); PPAR signaling pathway (p-value: 1.3e-4, 11.1%) 

QC9 
Biological Process: protein maturation (p-value: 2.8e-6, 80.0%); humoral immune response mediated by circulating immunoglobulin (p-value: 3.0e-5, 60.0%) 
Molecular Function: carbohydrate binding (p-value: 5.4e-2, 40.0%) 
Cellular Component: extracellular space (p-value: 5.9e-4, 80.0%) 
KEGG Pathway: Prion diseases (p-value: 1.4e-4, 60.0%); Complement and coagulation cascades (p-value: 5.4e-4, 60.0%) 

The significant terms are mentioned along with their significance p-values and percentage of proteins associated with each term. DAVID online tool has been used to perform the significance tests. 

doi:10.1371/journal.pone.0094029.t004 



that while finding the quasi-bicliques, we executed the quasi- 
biclique finding method on 9 different bipartite graphs, 
corresponding to the 9 quasi-cliques. Each of these graphs contains the 
human proteins from the corresponding quasi-clique and all the 
diseases. The γ value is set to 0.7, so that each identified 
quasi-biclique has a density of at least 0.7. Out of the nine quasi-cliques, 
we found four quasi-cliques, QC1, QC2, QC4 and QC8, which 
overlap with the obtained quasi-bicliques on protein-disease 
association networks. These quasi-bicliques, termed QBD1, 
QBD2, QBD3 and QBD4, are reported in Table 5. In each 
quasi-biclique in the human protein-disease association network, two human 
proteins have been found to overlap with the corresponding 
quasi-clique. These proteins can thus be considered as gateways to the 
diseases. QBD1 overlaps with QC1 through two proteins, PSMB8 
and PSMB9, which are associated with five different diseases. 
QBD2 overlaps with QC2 through two host proteins, TNFRSF1A and 
TNFRSF1B, and these proteins are highly associated with 12 
diseases. The quasi-clique QC4 and the quasi-biclique QBD3 have 
two common proteins, TP53 and MDM2, which are connected 
to 9 diseases including various types of cancer. Two proteins, 
EGFR and MDM2, are common to QBD4 and QC8, and these 
proteins are associated with 5 diseases, which are mainly 
different cancer types. Interestingly, MDM2 belongs to both 
QBD3 and QBD4. 
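
The density threshold of 0.7 can be illustrated with the sizes and densities reported in Table 5. The sketch below assumes that the density of a quasi-biclique is the number of protein-disease links divided by the number of possible protein-disease pairs; the link counts themselves are not given in the table and are back-calculated here, so they should be read as implied values rather than reported ones.

```python
GAMMA = 0.7  # density threshold stated in the text

def biclique_density(n_proteins, n_diseases, n_links):
    """Links present divided by all possible protein-disease pairs."""
    return n_links / (n_proteins * n_diseases)

def implied_links(n_proteins, n_diseases, density):
    """Link count implied by a reported density (rounded to an integer)."""
    return round(density * n_proteins * n_diseases)

# (protein count, disease count, reported density) from Table 5.
table5 = {
    "QBD1": (2, 5, 0.7000),
    "QBD2": (2, 12, 0.7083),
    "QBD3": (2, 9, 1.000),
    "QBD4": (2, 5, 0.7000),
}

for name, (n_p, n_d, reported) in table5.items():
    links = implied_links(n_p, n_d, reported)
    density = biclique_density(n_p, n_d, links)
    print(name, "implied links:", links,
          "density:", round(density, 4),
          "passes threshold:", density >= GAMMA)
```

A density of 1.000, as for QBD3, means that both proteins are linked to every disease in the quasi-biclique, i.e., it is a complete biclique.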

As is evident from Table 5, several diseases are associated with the 
four quasi-bicliques in the human protein-disease association network. 
Many of these diseases are already established to be 
related to HCV infection. Graves' disease is an autoimmune 
disease in which the thyroid is overactive. It has been found recently 
that chronic HCV infection may lead to destructive thyroiditis 
followed by Graves' disease [34]. Diabetes (type I and II) is 
well known to be associated with HCV infection [35,36]. 
Interferons are proteins that are released in the presence of 
viral particles in cells. It has been established recently that HCV 
infection suppresses the interferon response in the liver [37]. The 
relationship of psoriasis, another autoimmune disease affecting 
skin, with HCV infection is also well known [38]. We have also found 
malaria as one of the diseases in the quasi-bicliques. A recent study has 
revealed that HCV infection may lead to slower emergence of the malaria 
parasite Plasmodium falciparum in blood [39]. Crohn's disease is the 
condition of continuous inflammation of the digestive tract. Inflammatory 
bowel diseases (IBD) such as Crohn's disease or colitis are established to 
be linked with viral hepatitis [40,41]. Also, systemic lupus 
erythematosus has been found to be more prevalent in HCV 
infected patients [42]. Rheumatoid arthritis, a common disease 
inducing inflammation in joints, is also well linked with HCV 
infection, and people with HCV often show raised levels of 
rheumatoid factor in their blood [43]. Table 5 also reports some 
types of cancer to be associated with the proteins in the 
quasi-bicliques. Recent research has focused on the development of cancer 
in HCV infected patients, and different studies have established 
links between hepatitis C and various types of cancer such as liver 
cancer [44], breast cancer [45], leukemia [46], colorectal cancer 
[47,48], endometrial cancer [47,48], and lung cancer [49]. Two 
bone related terms, bone mass and bone density, are also reported 
in Table 5. Some studies have already shown that chronic HCV 
infection significantly reduces bone mineral density [50]. Moreover, 
it has been found that HCV infection is a risk factor for bone 
fractures [51]. As depicted in the table, HCV infection has also 
been found to be associated with a higher risk of coronary diseases 
[52]. The above discussion indicates that many of the diseases 
reported in our study already have evidence in the literature for their 
association with hepatitis C viral infection. Hence the quasi-cliques 
and quasi-bicliques obtained in our study may shed light on the 
possible pathways of HCV pathogenesis leading to these diseases. 

Analyses of Gateway Proteins 

Previous results and discussions have pointed out two types of 
gateway proteins: one set acts as the gateway to the host cellular 
mechanism for the viral proteins, and the second set consists of the 
host proteins that have a high degree of association with different kinds 
of diseases. The first set VH (Viral-Host) contains 15 host proteins: 
PSME3, TP53, TBP, TRADD, STAT3, HNRNPK, NR4A1, 
SETD2, PSMB9, TRAF2, STAT1, CALR, JAK1, VIM and 
UBQLN1 (Tables 2 and 3). The second set HD (Human-Disease) 
contains 7 host proteins: PSMB8, PSMB9, TNFRSF1A, 
TNFRSF1B, TP53, MDM2 and EGFR. The results reveal that 
HCV infection pathogenesis should propagate through the 
proteins in the VH and HD sets, and thus these proteins play an 
extremely important role during viral infection. In particular, the 



Table 5. Quasi-bicliques found for human protein-disease association network corresponding to four quasi-cliques. 

QBD1 (corresponding QC: QC1) 
Human proteins (count: 2): PSMB8, PSMB9 
Diseases (count: 5): Graves disease, diabetes (type 1), interferon response, psoriasis, malaria; hypoglycemia; hyperparasitemia 
Density: 0.7000 

QBD2 (corresponding QC: QC2) 
Human proteins (count: 2): TNFRSF1A, TNFRSF1B 
Diseases (count: 12): Crohn's disease, ulcerative colitis, cystic fibrosis, Lupus, Rheumatoid Arthritis, diabetes (type 2), amyloidosis, breast cancer, Tumor necrosis factor receptor-associated periodic syndrome, bone density, bone mass, obesity 
Density: 0.7083 

QBD3 (corresponding QC: QC4) 
Human proteins (count: 2): TP53, MDM2 
Diseases (count: 9): DNA Damage | Lung Neoplasms, B-Cell Chronic Lymphocytic Leukemia, bladder cancer, breast cancer, colorectal cancer, endometrial cancer, liver cancer, lung cancer, stomach cancer 
Density: 1.000 

QBD4 (corresponding QC: QC8) 
Human proteins (count: 2): EGFR, MDM2 
Diseases (count: 5): colorectal cancer, lung cancer, Acute Coronary Syndrome, Breast Neoplasms, Carcinoma | Non-Small-Cell Lung | Exanthema | Lung Neoplasms 
Density: 0.7000 

The human proteins and diseases associated with each quasi-biclique are reported along with the densities of the quasi-bicliques. 
doi:10.1371/journal.pone.0094029.t005 

Table 6. Significant GO-BP and KEGG pathway terms for viral-human gateway proteins. 

Significant GO-BP terms 
cytokine-mediated signaling pathway (p-value: 3.7e-5) 
regulation of apoptosis (p-value: 5.2e-5) 
regulation of programmed cell death (p-value: 5.5e-5) 
regulation of cell death (p-value: 5.6e-5) 
positive regulation of macromolecule metabolic process (p-value: 7.4e-5) 

Significant KEGG pathways 
Pancreatic cancer (p-value: 5.5e-4) 
Pathways in cancer (p-value: 5.6e-3) 

The significant terms are mentioned along with their significance p-values. 
DAVID online tool has been used to perform the significance tests. 
doi:10.1371/journal.pone.0094029.t006 

proteins in the set VH are responsible for the initiation of the 
infection process. First we compared the average degrees of gateway 
and non-gateway proteins and found that the average degree of 
gateway proteins is 21.6364, whereas the average degree of 
non-gateway proteins is 4.2295. The difference is statistically significant 
as per Wilcoxon's rank sum test (p-value: 1.3006e-09). This 
suggests that the viral proteins tend to attack high-degree host 
proteins for initiating infection. Moreover, to test whether these 
proteins have some unique features, we investigated their GO 
(BP) and pathway enrichment (Table 6). It is evident from the table 
that the significant GO-BP terms are mostly involved in apoptosis and 
programmed cell death, which indicates that the targeted host 
proteins are highly associated with the process of cell death. 
Moreover, the significant pathways suggest that HCV infection may 
ultimately lead to various cancer types including pancreatic cancer, 
which is already established in a recent study [53]. 
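
The degree comparison above relies on Wilcoxon's rank sum test. The following is a minimal sketch of that test using the normal approximation with a tie correction; it is not the authors' code. The two degree lists are hypothetical values invented for illustration, since the per-protein degrees are not listed in the paper, so the printed numbers will not reproduce the reported averages or p-value.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    # Rank sum of the first sample and its null mean and variance.
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal
    return z, p

# Hypothetical interactome degrees, for illustration only.
gateway_degrees = [35, 12, 28, 19, 41, 8, 22, 15]
non_gateway_degrees = [3, 1, 7, 2, 5, 4, 9, 1, 6, 2, 3, 10]

z, p = rank_sum_test(gateway_degrees, non_gateway_degrees)
print(f"z = {z:.3f}, two-sided p = {p:.2e}")
```

A positive z indicates that the first sample (gateway proteins) tends to have higher degrees. For small samples an exact test would be preferable to the normal approximation used here.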

Conclusions 

In this article a system-wide study has been made for identifying 
possible infection pathways of hepatitis C virus. For this purpose, 
quasi-bicliques in HCV-human protein interaction network are 